Using 5 ms segments in concatenative speech synthesis

نویسندگان

  • Toshio Hirai
  • Seiichi Tenpaku
چکیده

A concatenative speech synthesis system increases its potential to generate natural speech if the system uses more short speech segments, since the concatenation variation becomes greater. In this paper, we propose the use of very short speech segments (5 ms, one pitch period of 200 Hz pitch) for concatenative speech synthesis. The proposed method is applied to the speech database CMU ARCTIC, and 100 sentences synthesized. Though the synthesized speech maintains the speaker’s identity and is natural enough, it also has some noises caused by inappropriate unit selection, and the formant changes are awkward in some vowel regions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis

, – Spectrum at each segment boundary for calculation of concatenation cost (2) Synthesis stage – Text-to-Feature •Generate features from input text (linguistic/prosodic-information) – Feature-to-Speech • Find the N-best candidates in each frame (preselection) according to segment's target cost • Find the best path from the N-best candidates based on concatenation cost •Concatenate the segments...

متن کامل

Synthesis Units for Conversational Speech - Using Phrasal Segments -

This paper describes the use of phrase-sized segments for the concatenative synthesis of conversational speech and discusses the differences in selection criteria that become necessary when the source corpus contains several years of conversational speech samples. It claims that naturalsounding conversational speech can be reproduced by use of such phrase-sized chunks for concatenation, and tha...

متن کامل

Databases of Heterogeneous Segments for Concatenative Speech Synthesis

Heterogeneous segments can enhance the quality of concatenative speech synthesis especially for highly inflected languages. In this paper we present a brief analysis of the segment types on a general level and discuss the problems related to optimising databases of heterogeneous segments. We present a brief discussion of the algorithmical complexity for the proposed approach and offer some heur...

متن کامل

On the Detection of Discontinuities in Concatenative Speech Synthesis

Last decade considerable work has been done in finding an objective distance measure which is able to predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts and discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets —most of them of spectral na...

متن کامل

IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units

Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores natural speech features segments, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004